
feat: persistent-zval helpers (deep-copy zval trees across threads) #2366

Open
nicolas-grekas wants to merge 1 commit into php:main from nicolas-grekas:persistent-zval-helpers

@nicolas-grekas
Contributor

Second step of the split suggested in #2287: land the persistent-zval
subsystem as a standalone, reviewable header, independent of background
workers. This is the subsystem most likely to hide latent refcount or
memory-lifetime bugs; reviewing it in isolation is higher-signal than
finding issues inside a 3k-line diff.

What

  • persistent_zval.h (renamed from the bg_worker_vars.h draft,
    prefix dropped for generality):

    • persistent_zval_validate — whitelist (scalars, arrays of allowed
      values, enum instances). Everything else fails fast.
    • persistent_zval_persist — deep-copy request → persistent (pemalloc)
      memory. Fast paths baked in: interned strings shared, opcache-
      immutable arrays passed by pointer without copying or owning.
    • persistent_zval_free — deep-free; skips interned strings and
      immutable arrays (borrowed, not owned).
    • persistent_zval_to_request — deep-copy persistent → fresh request
      memory. Enums re-resolved by class + case name on each read.
  • frankenphp.c: header included only when FRANKENPHP_TEST_HOOKS is
    defined. First real consumer (background workers) drops the guard.

  • Test hook gated on FRANKENPHP_TEST_HOOKS:

    • PHP function frankenphp_test_persist_roundtrip(mixed): mixed runs
      validate → persist → to_request → free and returns the result.
    • Registered via zend_register_functions at MINIT so it never
      appears in ext_functions[] and never ships in production builds.
  • CI workflows set -DFRANKENPHP_TEST_HOOKS in CGO_CFLAGS
    (tests.yaml + sanitizers.yaml). windows.yaml is the release
    build, not a test runner, and stays untouched.
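
For readers unfamiliar with the ownership rules described above, here is a minimal sketch of the validate → persist → to_request → free round trip. It is a toy model, not the code in persistent_zval.h: `toy_val` and every `toy_*` function are hypothetical stand-ins for zvals and the real helpers, plain `malloc` stands in for both persistent and request memory, and a `borrowed` flag stands in for interned strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a zval. NOT the Zend API: every name is hypothetical. */
typedef enum { TV_LONG, TV_STRING, TV_ARRAY, TV_RESOURCE } toy_type;

typedef struct toy_val {
    toy_type type;
    bool borrowed;          /* models an interned string: shared, never freed */
    long lval;
    char *str;
    struct toy_val *items;  /* elements when type == TV_ARRAY */
    size_t count;
} toy_val;

void *toy_alloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

/* Whitelist check: integers, strings, and arrays of allowed values. */
bool toy_validate(const toy_val *v) {
    switch (v->type) {
    case TV_LONG:
    case TV_STRING:
        return true;
    case TV_ARRAY:
        for (size_t i = 0; i < v->count; i++)
            if (!toy_validate(&v->items[i])) return false;
        return true;
    default:
        return false;       /* everything else fails fast */
    }
}

/* Deep copy. Borrowed strings are shared by pointer, not duplicated. */
toy_val toy_copy(const toy_val *src) {
    assert(src != NULL);
    toy_val dst = *src;
    if (src->type == TV_STRING && !src->borrowed) {
        size_t n = strlen(src->str) + 1;
        dst.str = toy_alloc(n);
        memcpy(dst.str, src->str, n);
    } else if (src->type == TV_ARRAY) {
        dst.items = src->count ? toy_alloc(src->count * sizeof *dst.items) : NULL;
        for (size_t i = 0; i < src->count; i++)
            dst.items[i] = toy_copy(&src->items[i]);
    }
    return dst;
}

/* Deep free. Borrowed strings are skipped because the copy never owned them. */
void toy_free(toy_val *v) {
    if (v->type == TV_STRING && !v->borrowed) {
        free(v->str);
    } else if (v->type == TV_ARRAY) {
        for (size_t i = 0; i < v->count; i++) toy_free(&v->items[i]);
        free(v->items);
    }
    v->str = NULL;
    v->items = NULL;
    v->count = 0;
}

/* Same order as the test hook: validate -> persist -> to_request -> free. */
bool toy_roundtrip(const toy_val *in, toy_val *out) {
    if (!toy_validate(in)) return false;
    toy_val persistent = toy_copy(in);   /* "persist" */
    *out = toy_copy(&persistent);        /* "to_request" */
    toy_free(&persistent);               /* "free" the persistent copy */
    return true;
}
```

The point of the sketch is the asymmetry: owned data is duplicated on every copy and released on free, while borrowed data is passed by pointer in both directions and must outlive every copy that refers to it.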

Notes

  • Build verified both without the flag (production path, no
    unused-function warnings) and with it (test path).
  • The FRANKENPHP_TEST_HOOKS guard around the header include goes
    away in the PR that lands the first real caller; the test hook
    itself goes away in that same step once end-to-end tests cover the
    code paths.

Contributor Author

@nicolas-grekas nicolas-grekas left a comment

The PR is ready. FYI, I made some minor performance optimizations and cleanups, nothing substantial.

@zeriyoshi
Contributor

zeriyoshi commented May 3, 2026

Hi, I'm currently working on an OPcache-integrated user/static cache implementation, and I'm planning to submit an RFC and implementation soon.

The implementation is not FrankenPHP-specific: it works under both NTS and ZTS, including FrankenPHP. Instead of using per-thread storage, it uses OPcache-managed shared-memory backends, so cached data can be shared across threads/processes while keeping request isolation. That is the main reason I think it may also fit the requirements of this PR.

At a high level, the implementation adds separate OPcache-managed shared-memory backends for:

  • an explicit user cache, configured by opcache.user_cache_memory_consumption
  • a stricter global/static cache, configured by opcache.global_cache_memory_consumption

It also adds explicit cache APIs and static-cache attributes, including:

  • \OPcache\user_store(), \OPcache\user_fetch(), \OPcache\user_clear()
  • \OPcache\global_store(), \OPcache\global_fetch(), \OPcache\global_clear()
  • and a few additional helper functions
  • #[\OPcache\CachedStatic] attribute
  • #[\OPcache\GlobalStatic] attribute

There are additional helper APIs in the current implementation as well, such as exists/delete/bulk-store/atomic-increment/status APIs, but the above are the main pieces relevant to this discussion.

The storage path uses zero-copy/shared-memory representations where possible. Scalars are stored directly, eligible arrays and supported object graphs can use a shared-graph representation, and values that cannot use that path fall back to OPcache's binary serializer and, where needed, PHP serialization. User-defined objects and a vetted set of internal objects are supported, while resources and closures are rejected.

One important semantic detail: \OPcache\user_store() itself does not track object graphs after storing. It stores the value as a snapshot at the time of the call. If the original object is mutated afterwards, that mutation is not automatically reflected in the cached value. The recursive object mutation tracking described below applies to #[\OPcache\CachedStatic], not to explicit user_store() entries.

The difference between the two attribute modes is:

  • CachedStatic / user cache backend: APCu-like, volatile behavior. Values are published through the user-cache backend, and object graphs assigned to cached static state can be tracked and published at request shutdown. On memory pressure, the implementation may evict expired entries, and can clear the volatile user cache as a whole when doing so would create enough space for the pending payload.
  • GlobalStatic / global cache backend: stricter store-point snapshot behavior. Object graphs are not recursively tracked after assignment; subsequent object-property mutations are request-local unless the static root is assigned again. Because static-property root assignments are published immediately, serialization/capacity failures are detected at the assignment/store point where possible. For static attribute publication failures, this is treated as a hard failure rather than silently evicting unrelated global state.
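
To make the difference under memory pressure concrete, here is a small sketch of the two policies as described above. It is a toy model under stated assumptions, not the OPcache implementation: `toy_cache`, `toy_store`, the fixed slot table, and the integer clock are all hypothetical, and the real backends certainly track more state than a byte counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical bookkeeping: only sizes and expiry times, no payloads. */
typedef struct { bool used; size_t size; long expires_at; } toy_entry;

typedef struct {
    toy_entry slots[TOY_SLOTS];
    size_t capacity;      /* total bytes available */
    size_t used;          /* bytes currently in use */
    bool volatile_mode;   /* true: user-cache-like, false: global-cache-like */
} toy_cache;

typedef enum {
    TOY_STORED,
    TOY_STORED_AFTER_EVICT,
    TOY_STORED_AFTER_CLEAR,
    TOY_FAILED
} toy_result;

/* Try to place an entry without evicting anything. */
static bool toy_put(toy_cache *c, size_t size, long expires_at) {
    assert(c->used <= c->capacity);
    if (c->capacity - c->used < size) return false;
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!c->slots[i].used) {
            c->slots[i] = (toy_entry){ true, size, expires_at };
            c->used += size;
            return true;
        }
    }
    return false;         /* no free slot */
}

static void toy_evict_expired(toy_cache *c, long now) {
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (c->slots[i].used && c->slots[i].expires_at <= now) {
            c->slots[i].used = false;
            c->used -= c->slots[i].size;
        }
    }
}

static void toy_clear(toy_cache *c) {
    for (size_t i = 0; i < TOY_SLOTS; i++) c->slots[i].used = false;
    c->used = 0;
}

toy_result toy_store(toy_cache *c, size_t size, long expires_at, long now) {
    if (toy_put(c, size, expires_at)) return TOY_STORED;
    /* Strict mode: fail hard instead of evicting unrelated state. */
    if (!c->volatile_mode) return TOY_FAILED;
    toy_evict_expired(c, now);
    if (toy_put(c, size, expires_at)) return TOY_STORED_AFTER_EVICT;
    /* Clear everything only if that actually makes room for the payload. */
    if (size <= c->capacity) {
        toy_clear(c);
        if (toy_put(c, size, expires_at)) return TOY_STORED_AFTER_CLEAR;
    }
    return TOY_FAILED;
}
```

In this model a volatile cache degrades gracefully (expired entries first, then a full clear), while a strict cache reports the failure at the store point and leaves existing entries untouched.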

The benchmark results below are from a prime-then-read HTTP workload (steady-state): each measured row is warmed up first, then measured with 5000 cache-hit operations per request. The measured requests had a 100% cache-hit ratio, with no build/store work included in the timing samples. I will publish the benchmark workload and raw payload-level results together with the RFC/PR.

Current benchmark results on FrankenPHP:

$ ./benchmark_on_container.sh --php-repo ../ --iterations 30 --warmup 5 --runtime frankenphp --frankenphp-ref main --frankenphp-threads 5
FrankenPHP (ZTS) (5000 operations/request)
- apcu_store / apcu_fetch: 84.248 ms/request (16.85 us/op)
- user_store / user_fetch: 4.007 ms/request (0.80 us/op, 95.25% faster than APCu)
- global_store / global_fetch: 4.000 ms/request (0.80 us/op, 95.25% faster than APCu)
- Class #[\OPcache\CachedStatic]: 41.795 ms/request (8.36 us/op, 50.39% faster than APCu)
- Property #[\OPcache\CachedStatic]: 0.637 ms/request (0.13 us/op, 99.23% faster than APCu)
- Method #[\OPcache\CachedStatic]: 0.844 ms/request (0.17 us/op, 98.99% faster than APCu)
- Class #[\OPcache\GlobalStatic]: 6.794 ms/request (1.36 us/op, 91.93% faster than APCu)
- Property #[\OPcache\GlobalStatic]: 0.640 ms/request (0.13 us/op, 99.23% faster than APCu)
- Method #[\OPcache\GlobalStatic]: 0.855 ms/request (0.17 us/op, 98.99% faster than APCu)

For reference, NTS php-fpm + nginx:

$ ./benchmark_on_container.sh --php-repo ../ --iterations 30 --warmup 5 --runtime fpm
php-fpm + nginx (NTS) (5000 operations/request)
- apcu_store / apcu_fetch: 83.445 ms/request (16.69 us/op)
- user_store / user_fetch: 4.010 ms/request (0.80 us/op, 95.21% faster than APCu)
- global_store / global_fetch: 4.014 ms/request (0.80 us/op, 95.21% faster than APCu)
- Class #[\OPcache\CachedStatic]: 43.176 ms/request (8.64 us/op, 48.23% faster than APCu)
- Property #[\OPcache\CachedStatic]: 0.600 ms/request (0.12 us/op, 99.28% faster than APCu)
- Method #[\OPcache\CachedStatic]: 0.799 ms/request (0.16 us/op, 99.04% faster than APCu)
- Class #[\OPcache\GlobalStatic]: 6.442 ms/request (1.29 us/op, 92.27% faster than APCu)
- Property #[\OPcache\GlobalStatic]: 0.600 ms/request (0.12 us/op, 99.28% faster than APCu)
- Method #[\OPcache\GlobalStatic]: 0.801 ms/request (0.16 us/op, 99.04% faster than APCu)

In aggregate, this implementation is significantly faster than APCu in this read-heavy workload, not only on ZTS/FrankenPHP but also on a traditional NTS php-fpm + nginx setup. There are still workload-specific trade-offs, especially for small objects and serializer-fallback cases, so I intend to publish the raw per-payload benchmark data as part of the RFC.
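
For anyone checking the tables, the derived columns can be reproduced with the two helpers sketched below. The function names are illustrative only and are not part of the benchmark script. The 5000 operations per request come from the run headers; the published percentages appear consistent with being computed from the rounded per-operation figures rather than from the raw ms/request values, which is an inference from the numbers, not something the benchmark output states.

```c
#include <assert.h>

/* Microseconds per operation, given milliseconds per request. */
double us_per_op(double ms_per_request, int ops_per_request) {
    assert(ops_per_request > 0);
    return ms_per_request * 1000.0 / ops_per_request;
}

/* Percent reduction in time per operation relative to a baseline. */
double percent_faster(double baseline_us, double candidate_us) {
    assert(baseline_us > 0.0);
    return (1.0 - candidate_us / baseline_us) * 100.0;
}
```

For the FrankenPHP APCu row, `us_per_op(84.248, 5000)` gives roughly 16.85, and `percent_faster(16.85, 0.80)` gives roughly 95.25, matching the user_store / user_fetch row.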

If there is interest in this direction, please let me know. I'll prioritize getting the RFC text, implementation PR, and benchmark suite ready for review.

@nicolas-grekas
Contributor Author

Thanks, that's interesting. It's not a fit for this PR at this stage, since we're looking for something with simpler "message"-style behavior that can also be used on older versions of PHP.
But having something better than APCu is definitely worth experimenting with!
Check out the deepclone extension, BTW!
